Welcome to deep learning. So in this short video we want to go ahead and look into some basic functions of neural networks, and in particular we want to look into the softmax function and into some ideas of how we could potentially train deep networks.
Okay, so let's start. Activation functions for classification. Now, so far we have described the ground truth by the labels minus one and plus one, but of course we could also use the classes zero and one. So this is really only a matter of definition as long as we only decide between two classes. But if you want to go to more complex cases, you want to be able to classify multiple classes. In this case you probably want to have an output vector, and here you have essentially one dimension per class k. So capital K here is the number of classes, and you can then define a ground-truth representation as a vector that has all zeros except for one position, and that position is the true class. This is also called one-hot encoding, because all of the other parts of the vector are zero and only a single position has a one. And now you try to compute a classifier that will produce a respective vector, and with this vector y hat you can then go ahead and do the classification. So it's essentially like guessing an output probability for each of the classes. In particular for multi-class problems, this has been shown to be an effective way of approaching these problems.
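As a small illustration of this one-hot encoding (a sketch in Python/NumPy, not code from the lecture; the function name and arguments are my own):

```python
import numpy as np

def one_hot(true_class, num_classes):
    """Ground-truth vector: all zeros except a single one at the true class index."""
    y = np.zeros(num_classes)
    y[true_class] = 1.0
    return y

# e.g. the third of K = 4 classes -> [0. 0. 1. 0.]
print(one_hot(2, 4))
```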
Now the problem is that you want to have a kind of probabilistic output between zero and one, but we typically have some arbitrary input vector x that could be arbitrarily scaled. So in order to produce our predictions we employ a trick, and the trick is that we use the exponential function. This is very nice, because the exponential function will map everything into a positive range. Now you want to make sure that the maximum that can be attained is exactly one. So you do that for all of your classes: you apply the exponential function to every element of the input vector and sum these exponentials up, and this sum gives you the maximum that can be attained by this conversion. You then divide each exponentiated input by this number, which will always scale the outputs to the zero-one range, and it will have the property that if you sum up all elements of the vector, the result equals one. This is very nice, because these are two axioms of a probability distribution as introduced by Kolmogorov. So this allows us to always treat the output of the network as a kind of probability. If you look in the literature, or also in software packages, the softmax function is sometimes also known as the normalized exponential function; it's the same thing.
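A minimal sketch of this normalized exponential in Python/NumPy (my own code, not the lecture's; subtracting the maximum before exponentiation is a common numerical-stability assumption, not part of the definition):

```python
import numpy as np

def softmax(x):
    """Normalized exponential: maps an arbitrary real-valued vector to a probability vector."""
    e = np.exp(x - np.max(x))  # shift by the max for numerical stability (standard practice, my assumption)
    return e / np.sum(e)       # dividing by the sum of exponentials makes the outputs sum to one
```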
Now let's look at an example. So let's say this is the input to our neural network; you see this small image on the left. Now you introduce labels for this three-class problem. Wait, there's something missing. It's a four-class problem. So you introduce labels for this four-class problem, and then you have some arbitrary input that is shown here in the column x_k. The values range from minus 3.44 to 3.91. This is not so great, so let's use the exponential function. Now everything is mapped to positive numbers, and there is quite a difference between the numbers. So we need to rescale them, and you can see that the highest probability is of course returned for heavy metal in this image.
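To make this concrete, here is a small usage sketch with the softmax function from above; only the smallest and largest inputs, minus 3.44 and 3.91, are taken from the slide, and the two middle values are made up for illustration:

```python
# continuing the softmax sketch from above
scores = np.array([-3.44, 1.16, -0.81, 3.91])  # hypothetical x_k values; only -3.44 and 3.91 appear on the slide
probs = softmax(scores)
print(probs)        # the largest input, 3.91, receives by far the highest probability
print(probs.sum())  # 1.0 -- the outputs form a valid probability distribution
```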
So let's go ahead and also talk a bit about loss functions. The loss function is a function that tells you how good the prediction of a network is, and a very typical one is the so-called cross-entropy loss. It's the cross entropy that is computed between two probability distributions. So you have your ground-truth distribution and the one that you are estimating, and then you can compute the cross entropy between them in order to determine how well they align with each other, and you can then use this
as a loss function. Here we can use the property that all of the elements of the ground-truth vector will be zero except for the true class, so the cross entropy reduces to the negative logarithm of the predicted probability for the correct class.
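A hedged sketch of this cross-entropy loss for a one-hot ground truth (again my own illustration in Python/NumPy, with made-up numbers):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross entropy between a one-hot ground truth and a predicted probability vector."""
    y_pred = np.clip(y_pred, eps, 1.0)       # clipping avoids log(0); eps is my own stability assumption
    return -np.sum(y_true * np.log(y_pred))  # with a one-hot y_true this is just -log of the true-class probability

# hypothetical four-class example: the true class is the last one
y_true = np.array([0.0, 0.0, 0.0, 1.0])
y_pred = np.array([0.01, 0.04, 0.05, 0.90])
print(cross_entropy(y_true, y_pred))  # ~0.105, i.e. -log(0.9)
```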